Skip to content
This repository was archived by the owner on Sep 11, 2020. It is now read-only.

Conversation

@JoshuaSjoding
Copy link
Contributor

This update primarily impacts iteration of Tree entries and the processing of Tree.Files().

Instead of returning a channel of files, Tree.Files() now returns a FileIter. It has the following benefits:

  • It returns files in the original order of the repository (relying on a new Tree.OrderedNames property)
  • It can return errors encountered when retrieving files and trees from underlying storage
  • It can be Closed without having to drain the entire channel
  • It defers the heavy lifting to a new TreeWalker type
  • Its behavior is a little more consistent with other Iter types
  • It's a little less prone to memory leaks

It has the following downsides:

  • Using it is slightly more cumbersome than simply ranging over a channel
  • It exposes a Close function, which means bothering with another Close() call
  • The TreeWalker implementation it relies upon is slightly harder to reason about than the closure-based version

This update includes a new TreeWalker type that will iterate through all of the entries of a tree and its descendant subtrees. It does the dirty work that Tree.walkEntries() used to do, but with a public API.

A new TreeIter type is also included that just walks through subtrees. This could be useful for performing a directory search while ignoring files/blobs altogether.

@codecov-io
Copy link

Current coverage is 60.35%

Merging #30 into master will increase coverage by +1.84% as of c190b44

@@            master     #30   diff @@
======================================
  Files           24      27     +3
  Stmts         1567    1657    +90
  Branches       200     216    +16
  Methods          0       0       
======================================
+ Hit            917    1000    +83
- Partial         93     101     +8
+ Missed         557     556     -1

Review entire Coverage Diff as of c190b44

Powered by Codecov. Updated on successful CI builds.

@mcuadros
Copy link
Contributor

I need to review it, but on the meanwhile can you squash the commits?

Instead of returning a channel of files, Tree.Files() now returns a
FileIter with these qualities:

* It returns files in the original order of the repository (relying on a
* new Tree.OrderedNames property)
* It can return errors encountered when retrieving files and trees from
* underlying storage
* It can be Closed without having to drain the entire channel
* It defers the heavy lifting to a new TreeWalker type
* Its behavior is a little more consistent with other Iter types
* It's a little less prone to memory leaks

This update includes a new TreeWalker type that will iterate through all
of the entries of a tree and its descendant subtrees. It does the dirty
work that Tree.walkEntries() used to do, but with a public API.

A new TreeIter type is also included that just walks through subtrees.
This could be useful for performing a directory search while ignoring
files/blobs altogether.
@JoshuaSjoding
Copy link
Contributor Author

@mcuadros Commits are now squashed.

tree.go Outdated
Entries map[string]TreeEntry
Hash core.Hash
Entries map[string]TreeEntry
OrderedNames []string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is really need it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it would be useful and appropriate to keep the entries ordered. I already found it useful for writing tests, and it could make Tree comparisons more efficient when we can rely on a known order.

The hash of the tree is already dependent on its order, and that order generally seems well defined. If a tree is recreated with the same entries in a different order I expect that it would produce a different hash. Users of go-git might be surprised to get the files back in a weird order when the order is well defined by the tree itself.

Relying on the map here meant that the resulting order would always be random. Users could sort it again to put it back in order, but that seems wasteful. Also, golang's default lexicographic sort algorithm is different than what I'm seeing in most git repositories.

The implementation under review here does add some overhead and I actually don't like it very much. What I'd really like to do is this:

  1. Change the definition of Tree.Entries to []TreeEntry
  2. Add a Tree.Map property as map[string]*TreeEntry
  3. Build Tree.Map lazily: only on the first call to Tree.File() for example (it basically functions as a cache)
  4. Update TreeWalker to simply iterate through Tree.Entries (it wouldn't touch Tree.Map)

Making such a change would have these advantages:

  • The implementation would be cleaner than the current code under review
  • Users of Tree that don't need random lookups would never bother generating a map (a performance improvement)

I was hesitant to do this before because it's more invasive, but I'd be happy to update this PR with the changes described above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making Tree.Entries a slice ended up being pretty straightforward so I went ahead and added the change to this PR.

Tree's mapping of names to entries has been made internal, and will only be built when necessary with the first call to Tree.File().
mcuadros added a commit that referenced this pull request Feb 17, 2016
Improved consistency of Tree iterators
@mcuadros mcuadros merged commit 33dada7 into src-d:master Feb 17, 2016
@JoshuaSjoding JoshuaSjoding deleted the consistent-iterators branch June 24, 2016 09:10
mcuadros added a commit that referenced this pull request Jan 31, 2017
Improved consistency of Tree iterators
gsalingu-ovhus pushed a commit to gsalingu-ovhus/go-git that referenced this pull request Mar 28, 2019
Add missing api path in README
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants